Data Analysis by Python

Data Edit

If the data set contains data for Sundays, data analysis for weekdays ( Multi-Variable Analysis and Graphical Analysis ) does not go well. In this case, the data must be edited first.

For small data, editing is easy. This page describes methods for big data.

Make Data of Meta Knowledge may be needed before using the methods on this page.

Prepare to use Python

The code on this page assumes that an input data set named "df" already exists.

If the starting data is a CSV file, the code to make "df" is below.

import pandas as pd  # Import the package
df = pd.read_csv("Data.csv")  # Read the data

Method to Cut Data

Stratified Sampling

Stratified Sampling is the method to extract only the data we should analyze. For example, the code below keeps only the rows where C1 is 'A1'.

df[df.C1 == 'A1']

If we need the data where X1 is between 3 and 4 (both ends excluded), the code is below.
df[(df.X1 > 3) & (df.X1 < 4)]

AND condition --> " & "
OR condition --> " | "
NOT condition --> " ~ "
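
The three operators can be tried on a small table. The sketch below is only an illustration: the sample values are hypothetical, and the column names C1 and X1 simply follow the examples above. Each condition is wrapped in parentheses because "&" and "|" bind more tightly than comparisons such as "==" and ">" in Python.

```python
import pandas as pd

# Hypothetical sample data; C1 and X1 follow the column names used above.
df = pd.DataFrame({
    "C1": ["A1", "A2", "A1", "A3"],
    "X1": [2.5, 3.2, 3.8, 4.5],
})

# AND: rows where C1 is 'A1' and X1 is greater than 3
and_rows = df[(df.C1 == "A1") & (df.X1 > 3)]

# OR: rows where C1 is 'A2' or X1 is greater than 4
or_rows = df[(df.C1 == "A2") | (df.X1 > 4)]

# NOT: rows where C1 is not 'A1'
not_rows = df[~(df.C1 == "A1")]
```

The same pattern should work for removing unwanted rows such as Sundays, assuming the data has a column that holds the day of the week.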

Decrease Dimension

Principal Component Analysis is one of the methods to decrease dimension. But the easiest way is to select only the columns we need, as below.

df.X1
or
df.loc[:,['X1']]
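
The two forms above do not return the same kind of object, which can matter in later steps. The following sketch, with hypothetical sample values and extra column names X2 and X3 added for illustration, shows the difference and how several columns could be kept at once.

```python
import pandas as pd

# Hypothetical sample data; X2 and X3 are assumed names for illustration.
df = pd.DataFrame({"X1": [1, 2, 3], "X2": [4, 5, 6], "X3": [7, 8, 9]})

# Attribute access returns a Series (one-dimensional).
one_column = df.X1

# A list of names inside .loc returns a DataFrame, even for one column.
one_column_table = df.loc[:, ["X1"]]

# Several columns can be kept by listing their names.
two_columns = df.loc[:, ["X1", "X3"]]
```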

Method to Paste Data

Vertical Paste

df3 = pd.concat([df1, df2])

If a variable (column) exists in only one of the data sets, its values in the rows that come from the other data set become missing values ("NaN").
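
As a minimal sketch of this behavior, the example below pastes two small, hypothetical data sets that share only the column C1. The option ignore_index=True is an addition for illustration: it renumbers the rows from 0, whereas without it each data set keeps its original row labels.

```python
import pandas as pd

# Hypothetical data sets: X1 exists only in df1, X2 only in df2.
df1 = pd.DataFrame({"C1": ["A1", "A2"], "X1": [1.0, 2.0]})
df2 = pd.DataFrame({"C1": ["A3"], "X2": [30.0]})

# Paste df2 under df1; columns missing on one side are filled with NaN.
df3 = pd.concat([df1, df2], ignore_index=True)

# Count the missing values in each column.
missing_per_column = df3.isna().sum()
```

Here df3 has three rows and the columns C1, X1 and X2; X1 is missing in the row from df2, and X2 is missing in the rows from df1.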

Horizontal Paste

df3 = pd.concat([df1, df2], axis=1)

For example, a column that exists in both data sets appears twice in the result, because the rows are simply placed side by side by their index.

If A1, A2 and A3 of each data set are needed in the same column, merge (the method below) is better.
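
The limitation of pasting side by side can be seen in a small sketch. The data below is hypothetical, and it assumes A1, A2 and A3 are values of a shared column C1 that are stored in a different order in each data set.

```python
import pandas as pd

# Hypothetical data sets; the values of C1 are in a different order.
df1 = pd.DataFrame({"C1": ["A1", "A2", "A3"], "X1": [1, 2, 3]})
df2 = pd.DataFrame({"C1": ["A3", "A1", "A2"], "X2": [30, 10, 20]})

# Rows are paired by index, not by the value of C1.
df3 = pd.concat([df1, df2], axis=1)

# The result has two columns named C1, and the first row pairs A1 with A3.
column_names = df3.columns.tolist()
first_row_keys = (df3.iloc[0, 0], df3.iloc[0, 2])
```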

Method of Cut and Paste

Merge

This example code merges the right data (df2) into the left data (df1). In Excel, the equivalent is the VLOOKUP function.

df3 = pd.merge(df1, df2, how='left')

If we do not need rows with "NaN", we use "inner" instead of "left". If we also need the rows that exist only in the right data (they have "NaN" in the left-side columns), we use "outer" instead of "left".
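
The three options can be compared on a small example. This is only a sketch: the data is hypothetical, and the key column is named explicitly with on="C1", which is an assumption added here (when "on" is omitted, pandas uses the columns that both data sets have in common).

```python
import pandas as pd

# Hypothetical data sets: A3 exists only in df1, A4 only in df2.
df1 = pd.DataFrame({"C1": ["A1", "A2", "A3"], "X1": [1, 2, 3]})
df2 = pd.DataFrame({"C1": ["A1", "A2", "A4"], "X2": [10, 20, 40]})

# left: keep every row of df1; X2 is NaN for A3.
left = pd.merge(df1, df2, on="C1", how="left")

# inner: keep only the keys found in both data sets (A1, A2); no NaN.
inner = pd.merge(df1, df2, on="C1", how="inner")

# outer: keep every key from both sides; X1 is NaN for A4, X2 is NaN for A3.
outer = pd.merge(df1, df2, on="C1", how="outer")
```

With this data, left has 3 rows, inner has 2 rows, and outer has 4 rows.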